Integration of simulated annealing into pigeon inspired optimizer algorithm for feature selection in network intrusion detection systems

In the context of 5G networks, the proliferation of access devices leads to heavier network traffic and shifting traffic patterns, and network intrusion detection faces greater challenges. The high dimensionality and complexity of network traffic result in complex models, reduced accuracy, and longer detection times. To tackle these challenges, this article proposes a feature selection algorithm for network intrusion detection systems that uses an improved binary pigeon-inspired optimizer (SABPIO). First, the raw dataset is pre-processed by one-hot encoding and standardization. Next, feature selection is performed using SABPIO, which employs simulated annealing and a population decay factor to identify the most relevant subset of features for subsequent evaluation. Finally, the selected subset of features is fed into decision tree and random forest classifiers to evaluate the effectiveness of SABPIO. The proposed algorithm has been validated experimentally on three publicly available datasets: UNSW-NB15, NSL-KDD, and CIC-IDS-2017. The experimental findings demonstrate that SABPIO identifies the most indicative subset of features through rational computation. This method significantly shortens the system's training duration and enhances detection rates; compared to the use of all features, it reduces the training and testing times by factors of at least 3.2 and 0.3, respectively. Furthermore, compared with the feature subsets selected by the CPIO and XGBoost algorithms, it improves the F1-score by 1.21% to 2.19% and by 1.79% to 4.52%.


INTRODUCTION
As 5G networks continue to advance and the number of access devices increases, network traffic has also increased significantly. With higher bandwidth, lower latency, and greater connection density, 5G networks are more vulnerable to insidious and efficient network attacks. To address network security concerns, it is recommended to implement a network intrusion detection system (NIDS) (Tsai & Lin, 2010) on computer systems to scan for any signs of unauthorized intrusion. The connection of a large number of devices to the 5G network requires NIDS to be capable of handling such a large-scale operation. Nevertheless, network data is characterized not only by its substantial volume but also by its high dimensionality (Ganapathy et al., 2013), resulting in prolonged model training times and diminished predictive performance (Hastie et al., 2009). Hence, the significance of feature selection algorithms in NIDS is self-evident (Alazab et al., 2012). Feature selection offers a means of identifying significant features and eliminating extraneous ones from a dataset. The objective is to choose the most indicative subset of features from the initial dataset, thereby reducing model complexity and enhancing predictive performance. It also enhances the accuracy and reliability of intrusion detection by minimizing false alarms and preventing missed alarms (Thakkar & Lohiya, 2022). NIDS that use feature selection algorithms have been extensively researched and implemented. They are a critical technical tool for ensuring network security.
The aim of feature selection is to identify a subset of features that closely approximates the optimal feature subset within a reasonable timeframe. The inclusion of feature selection has greatly improved the effectiveness of NIDS, aiming to identify a more suitable solution rather than an optimal one. At present, feature selection techniques based on bio-inspired algorithms exhibit superior performance when compared to other methods. Bionic algorithms draw inspiration from the collective behaviors of various animals (such as fireflies, wolves, fish, and birds), and researchers have introduced diverse computational approaches that emulate these species' behaviors, such as foraging, for problem optimization. These approaches include the Chaotic Firefly Algorithm, Grey Wolf Optimization, Artificial Fish Swarm Algorithm, and the Bird Swarm Algorithm, among others (Shoghian & Kouzehgar, 2012). Each member within a swarm intelligence algorithm embodies a potential solution, generating fresh individuals through continuous mutation and crossover. The Pigeon-Inspired Optimizer (PIO) algorithm is an emerging swarm intelligence algorithm, which has clear advantages in global search ability, convergence speed, and robustness compared with other swarm intelligence algorithms.
Effective feature selection algorithms can enhance the detection capabilities and efficiency of NIDS. Scientific and efficient decision-making in feature selection has emerged as a critical method to guarantee the operational security of networks. However, feature selection algorithms currently face several issues, including excessive feature pruning (Zhou et al., 2020), disregard for inter-feature correlations (Li et al., 2020), susceptibility to anomalous traffic, and difficulty in handling large datasets (Jaw & Wang, 2021). These challenges can lead to a decline in the model's generalization ability, increased complexity, and reduced stability, ultimately impacting the model's detection performance and efficiency (Rashid et al., 2022). To tackle the aforementioned issues, this article presents a feature selection algorithm for NIDS based on an improved binary pigeon-inspired optimization algorithm, aiming to enhance the accuracy and efficiency of feature selection in the context of network intrusion detection. The goal is to reduce the false positive rate and false negative rate in NIDS. This approach utilizes mutation and simulated annealing mechanisms during the map and compass operator phase to expand the search scope and prevent the feature subset from being stuck in local optima. Furthermore, it introduces a population decay factor in the landmark operator phase to control rapid population decline and regulate the algorithm's convergence rate. The article presents a method that selects the most representative feature subset through reasonable computation. This leads to a significant reduction in model training and testing times, while enhancing the model's detection rate and accuracy. The key contributions of this study include: (1) we conduct an investigation and analysis of existing NIDS feature selection algorithms, leading to the proposal of an improved NIDS feature selection algorithm based on enhancements to the binary PIO algorithm; (2) during the map and compass operator phase, a mutation mechanism is introduced to increase the diversity of the population, thereby expanding the search space of the algorithm. Additionally, a simulated annealing approach is incorporated to accept new solutions that are worse than the current solution with a certain probability, facilitating escape from local optima; (3) during the landmark operator phase, a population decay factor is proposed to dynamically adjust the population size for each iteration based on the fitness distribution of the population. The objective of this adjustment is to regulate the convergence speed of the algorithm; (4) the improved PIO algorithm is combined with a classifier and applied to NIDS. The algorithm is evaluated against state-of-the-art feature selection algorithms using datasets such as UNSW-NB15, NSL-KDD, and CIC-IDS-2017.
The remaining sections of this article are organized as follows. "Related Work" provides an overview of previous related work conducted by other researchers. In "Continuous Pigeon Inspired Optimizer", we present the architecture and formulation of the continuous PIO algorithm. "Proposed Improvement of PIO" describes the model of the proposed feature selection algorithm and provides detailed information on the updating steps. In "Experiments and Results", we conduct simulation experiments and evaluate the performance of our approach. Finally, in "Conclusion", we conclude and discuss future research directions.

RELATED WORK
The classification performance of network intrusion detection system models is significantly constrained by the high dimensionality and sheer volume of network traffic data. In light of the increasing volume of data, researchers have investigated sample selection methods to enhance the efficiency of the model training process. Feature selection algorithms have been developed to tackle challenges associated with high data dimensionality, as well as the presence of irrelevant and redundant features (Alazab et al., 2012) in datasets. Feature selection is crucial for enhancing model performance by eliminating irrelevant and redundant information from the dataset. By selecting only the most significant features for model training, it helps prevent overfitting and reduces feature dimensionality, thereby improving the efficiency of model training and prediction processes.
Traditional feature selection algorithms can be categorized into three types: filtered, wrapper, and embedded methods (Di Mauro et al., 2021). Filtered feature selection operates independently of the classifier, while wrapper methods involve evaluating the classifier during feature selection. Embedded methods integrate feature selection directly into the training process of the classifier. Each type has distinct benefits and is appropriate for different situations depending on the specific needs of the task. Filtered algorithms are computationally efficient but do not guarantee optimal feature selection. Embedded algorithms perform feature selection during intrusion model training and are computationally expensive for large datasets. Conversely, wrapper algorithms exhibit higher accuracy than the other two types but are sensitive to the quality of the training data. Achieving high accuracy is crucial for NIDS, and training time for offline data is not a significant concern. Therefore, this article uses the wrapper algorithm as the preferred method for feature selection, as it has been shown to provide the best results.
Table 1 provides a summary of the performance of different feature selection methods on different datasets, categorizing them into filtered, embedded, and wrapper methods. It includes details such as the number of features selected, the detection rate, and the false alarm rate for each method on each dataset. This table serves as a comprehensive overview of how these methods perform in the context of feature selection for intrusion detection.

Filtered feature selection method
Filtered feature selection algorithms do not use explicit criteria to determine the size of the subset. Instead, they rank features based on various evaluation metrics and select the top N features with the highest scores. This selection process is based on the intrinsic characteristics of the dataset and does not consider feedback from classification results for the features already selected. By focusing on feature ranking and selection independently of the classification model, filtered feature selection algorithms aim to identify the most relevant features for the given dataset without being influenced by the performance of a specific classifier. Amiri et al. (2011) introduced a mutual information-based feature selection (MIFS) technique for NIDS. However, the accuracy of mutual information estimation may be compromised in scenarios with limited data, resulting in the identification of suboptimal sets of features. Ambusaidi et al. (2016) proposed a mutual information-based method to select the optimal feature subset for classification from linearly and nonlinearly correlated data. The ARM feature selection model proposed by Moustafa & Slay (2017) focuses on enhancing detection performance by filtering out irrelevant features, retaining only significant ones, and leveraging association rule mining to identify feature combinations with strong correlations. The comprehensive results show that ARM effectively minimizes false alarms and significantly reduces processing time while maintaining accuracy. Stiawan et al. (2020) conducted experiments using the mutual information selection technique with a NIDS on 20% of the streams from the CIC-IDS-2017 dataset. Reducing the number of selected features lowered the accuracy, but the execution time also decreased significantly.
[...] differential evolution (DBDE) and quadratic discriminant analysis (QDA) to accelerate the process of wrapper feature selection. This approach aims to swiftly identify the optimal prediction features with minimal dimensions, thereby reducing the computational time needed. The experimental results demonstrate that DBDE-QDA offers decreased computational costs and effectively shortens the classification algorithm's computational time for network intrusion detection systems (NIDS). However, it may lead to a slight reduction in the detection rate for certain intrusion detection datasets.

Embedded feature selection method
In conclusion, while existing feature selection algorithms have partially addressed the challenges of intrusion detection systems (IDS) in 5G network environments, they still suffer from issues such as excessive feature selection, disregard for feature correlations, sensitivity to abnormal traffic, and difficulty in processing large-scale data. These challenges can reduce the model's generalization ability, increase complexity, and decrease stability, thereby affecting detection performance and efficiency. To address these challenges, this article proposes a feature selection method based on an improved binary pigeon-inspired optimization algorithm. In comparison to existing methods, the proposed approach incorporates mutation and simulated annealing mechanisms in the map and compass operator phase. These modifications are designed to enhance population diversity, expand the search space, and allow new solutions that are worse than the current solution to be accepted with a certain probability, thereby facilitating escape from local optima. Furthermore, the proposed algorithm incorporates a population decay factor in the landmark operator phase. This factor dynamically adjusts the population size of each iteration based on the fitness distribution of the population, thus controlling the algorithm's convergence speed. The objective is to achieve improvements in key performance indicators such as detection rate, false alarm rate, and processing time.

CONTINUOUS PIGEON INSPIRED OPTIMIZER
Duan & Qiao (2014) studied pigeon behavior. They found that pigeons use geomagnetic cues and landmarks to navigate, determine direction, and find their nests. Based on these findings, the PIO algorithm was developed to imitate pigeons' migration behaviors and help find optimal solutions through communication and cooperation. The algorithm includes the map and compass operator phase and the landmark operator phase.

Map and compass operator phase
The map and compass operator phase emulates how the sun and geomagnetic forces influence pigeon navigation. Pigeons assess the sun's position and geomagnetic cues to make real-time adjustments to their flight direction and strategize optimal routes. As pigeons approach their destination, they rely less on solar and geomagnetic guidance. During this phase, each pigeon is characterized by its positional and velocity data.
The PIO algorithm defines $V_i^t$ as the velocity of the $i$-th pigeon in the $t$-th iteration, and $P_i^t$ as its position. In each iteration, every pigeon adjusts its position $P_i^t$ and velocity $V_i^t$ according to Eqs. (1) and (2) (Duan & Qiao, 2014):
$$V_i^t = V_i^{t-1} \cdot e^{-Rt} + \mathrm{rand} \cdot \left(P_{global} - P_i^{t-1}\right) \tag{1}$$
$$P_i^t = P_i^{t-1} + V_i^t \tag{2}$$
In Eq. (1), $R$ represents the map and compass operator, $t$ denotes the current iteration number, $\mathrm{rand} \in [0, 1]$ is a random function, and $P_{global}$ stands for the globally optimal position obtained by comparing the positions of all pigeons in the $(t-1)$-th iteration.
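The update described above can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the continuous PIO update, in which the previous velocity is damped by exp(-R·t) and a random fraction of the distance to the global best is added. The function name `map_compass_step`, the list-based representation, and the choice of one random number per pigeon are assumptions made for illustration, not details taken from the article.

```python
import math
import random


def map_compass_step(positions, velocities, p_global, R, t, rng=random):
    """One continuous map-and-compass update for every pigeon (illustrative)."""
    decay = math.exp(-R * t)  # influence of the previous velocity fades as t grows
    new_positions, new_velocities = [], []
    for pos, vel in zip(positions, velocities):
        r = rng.random()  # assumption: a single random number in [0, 1) per pigeon
        # Velocity: damped old velocity plus a random pull toward the global best.
        v_new = [decay * v + r * (g - p) for v, p, g in zip(vel, pos, p_global)]
        # Position: previous position plus the new velocity.
        p_new = [p + v for p, v in zip(pos, v_new)]
        new_velocities.append(v_new)
        new_positions.append(p_new)
    return new_positions, new_velocities
```

A pigeon that already sits on the global best with zero velocity stays in place, whereas any other pigeon moves some random fraction of the way toward the global best.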

Landmark operator phase
The landmark operator mimics how navigational landmarks affect pigeons. Pigeons have the ability to rapidly store details about surrounding landmarks during navigation. As they approach the target location, pigeons rely on nearby landmarks to construct a mental map and fine-tune their position and speed in response to these landmarks until they reach the intended destination. If a pigeon is unfamiliar with the local landmark, it will adjust its flight based on the flight patterns of nearby pigeons that are familiar with the landmark. During the iterative process of the landmark operator phase, pigeons are eliminated based on their fitness disparity, removing the less adapted half of the pigeons. The central position of the remaining, more adept pigeons is then computed as the reference direction within the population. The position of the pigeon is updated at this phase based on Eqs. (3)-(5) (Duan & Qiao, 2014).
The iteration of the center position of the pigeon group is given by Eq. (3), where $Num_{pigeon}^t$ denotes the number of pigeons in the $t$-th iteration, $t$ signifies the current iteration number, and the fitness function $Fitness$ adopts distinct valuation methods for different problems; for minimization problems, it involves taking a reciprocal. $P_{center}^{t-1}$ represents the position of the pigeon center (desired destination) in the $(t-1)$-th iteration.
The iterative decay of the population is described by Eq. (4), in which the sorting function $sort$ sorts the pigeon group according to fitness.
Equation (5) describes how the remaining flock adjusts its position relative to the center position of the flock by incorporating the random function $\mathrm{rand} \in [0, 1]$.
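As a rough illustration of the landmark operator phase, the following Python sketch follows the steps described in the text: sort by fitness, discard the weaker half, compute a center of the survivors, and move each survivor a random fraction of the way toward that center. The function name `landmark_step`, the use of a fitness-weighted mean as the center, and the convention that a larger fitness is better are assumptions of this sketch; the exact Eqs. (3)-(5) may differ in detail.

```python
import random


def landmark_step(positions, fitness, rng=random):
    """One landmark-operator iteration; assumes positive, larger-is-better fitness."""
    # Sort best-first and keep the better half (at least one pigeon survives).
    ranked = sorted(positions, key=fitness, reverse=True)
    survivors = ranked[: max(1, len(ranked) // 2)]
    # Assumed center: fitness-weighted mean of the surviving positions.
    weights = [fitness(p) for p in survivors]
    total = sum(weights)
    dims = len(survivors[0])
    center = [
        sum(w * p[j] for w, p in zip(weights, survivors)) / total
        for j in range(dims)
    ]
    # Each survivor moves a random fraction of the way toward the center.
    moved = []
    for p in survivors:
        r = rng.random()
        moved.append([x + r * (c - x) for x, c in zip(p, center)])
    return moved, center
```

Because the population is halved on every call, repeated application shrinks the flock geometrically, which is the rapid decay that the improved landmark phase later in the article is intended to moderate.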
The PIO algorithm is logically coherent, easy to understand, robust, and has significant research implications. The PIO algorithm has been shown to be effective in addressing various challenges, including unmanned aerial vehicle path planning (Yuan & Duan, 2024), the security concerns associated with medical image encryption (Geetha et al., 2022), and the optimization of large-scale hydroelectric short-term generation (Tian et al., 2020). While the PIO algorithm exhibits superior performance compared to other swarm intelligence algorithms, it still suffers from drawbacks such as overly rapid convergence and a tendency to become trapped in local optima. To address these issues, this study introduces mutation and simulated annealing techniques to broaden the search scope. Additionally, a population decay factor is suggested to regulate the algorithm's convergence rate, thereby enhancing its overall performance, reducing the dimensionality of the selected feature data, and boosting the efficiency of intrusion detection.

PROPOSED IMPROVEMENT OF PIO
This article proposes a method that integrates simulated annealing into the binary PIO (SABPIO) algorithm for feature selection in NIDS. The approach incorporates simulated annealing and mutation into the conventional PIO algorithm, expanding the search scope and mitigating the risk of local optima. Additionally, a population decay factor is introduced to regulate the algorithm's convergence speed. The proposed SABPIO feature selection algorithm is shown in Fig. 1.
The proposed method generates the initial positions of the pigeons by utilizing randomly chosen features from the dataset, establishing the initial population. Decision tree (DT) and random forest (RF) classifiers are used to determine the search subject, which is the position of the pigeon closest to the target. These classifiers evaluate the fitness of each pigeon position within the population, and the positions of the remaining pigeons are adjusted based on the optimal solution. Following this, the pigeon swarm undergoes probabilistic positional adjustments utilizing simulated annealing. This mechanism aids in steering clear of local optimal solutions and enhances solution diversity within the search process. Finally, the population decay factor is used to decrease the pigeon population, which improves the exploration of solutions within the search space. The output of each iteration serves as the input for the following iteration until the optimal feature subset is identified.

Pigeon encoding
The pigeon position symbolizes a potential selection of features, with a single pigeon representing a particular feature subset. As shown in Fig. 2, the upper vector in the encoding denotes the feature's order number (dimension), while the lower vector indicates the pigeon's binary position within each dimension. The spatial dimension $d$ explored by the pigeon corresponds to the number of network features. In the binary vector $P_i = (p_{i1}, p_{i2}, \ldots, p_{id})$, $p_{ij} = 1$ signifies that feature $j$ within the feature subset represented by pigeon $i$ is chosen. Conversely, $p_{ij} = 0$ indicates that feature $j$ in the feature subset represented by pigeon $i$ is not selected, meaning it is excluded from the optimal feature subset.
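The encoding can be made concrete with a few lines of Python. The helper name `select_features` and the example record are hypothetical; the sketch only shows how a binary position vector acts as a mask over the feature columns.

```python
def select_features(row, pigeon):
    """Keep the values of `row` whose corresponding bit in `pigeon` is 1."""
    if len(row) != len(pigeon):
        raise ValueError("pigeon length must equal the number of features")
    return [value for value, bit in zip(row, pigeon) if bit == 1]


pigeon = [1, 0, 0, 1, 1]             # d = 5; features 1, 4 and 5 are selected
row = [0.2, 7.0, 3.5, 0.0, 1.0]      # one hypothetical traffic record
subset = select_features(row, pigeon)
```

Here `subset` holds only the first, fourth, and fifth values of the record, and `sum(pigeon)` gives the number of selected features.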

Fitness function
The fitness function evaluates the fitness of every individual. It is formulated considering the individual's traits and the specifications of the given problem, converting the individual into a numerical value that reflects its suitability for problem-solving. Given that the true positive rate (TPR) and false positive rate (FPR) serve as effective gauges for assessing the model's efficacy in identifying attacks and managing false positives in routine activities, a majority of researchers opt to employ TPR and FPR (Thakkar & Lohiya, 2023) as the fitness criteria (Louk & Tama, 2023) in their calculations.
Equations (6) and (7) provide the calculation formulas for TPR and FPR. In the feature selection problem of a NIDS, TP refers to the system identifying abnormal traffic as attack events, and TN refers to the system identifying normal traffic as non-attack events. FP refers to the system identifying normal traffic as an attack event, and FN refers to the system identifying abnormal traffic as a non-attack event. The SABPIO algorithm incorporates the ratio of selected features into the fitness function to account for their potential impact on intrusion detection time. This adjustment aims to eliminate features within the subset that do not significantly contribute to detection accuracy. The present study also introduces a fitness function formula, shown in Eq. (8), which reframes the optimization of feature selection as a minimization task.
where $Num_{SF}$ denotes the number of selected features, and $\lambda$ is the weighting factor with $\lambda \in (0, 1)$. In Eq. (8), the numerator considers the impact of the number of selected features on fitness, while the denominator accounts for the influence of NIDS performance on fitness. Through the fitness function, the SABPIO algorithm strikes a balance between feature quantity and classification performance, effectively enhancing classification efficiency while ensuring the accuracy of NIDS detection.
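To illustrate the trade-off described above, the following Python sketch defines a ratio-style fitness to be minimized. It is explicitly not Eq. (8): the functional form, the name `illustrative_fitness`, the default weight, and the small constant `eps` are all assumptions chosen only so that the numerator grows with the share of selected features and the denominator grows with detection quality.

```python
def illustrative_fitness(num_selected, num_total, tpr, fpr, weight=0.5, eps=1e-9):
    """Hypothetical minimization fitness; NOT the article's Eq. (8)."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    # Numerator: penalizes larger feature subsets (assumed form).
    feature_term = 1.0 + weight * num_selected / num_total
    # Denominator: rewards a high TPR and a low FPR (assumed form).
    detection_term = tpr * (1.0 - fpr) + eps
    return feature_term / detection_term
```

Under this assumed form, a subset with fewer features and the same detection rates receives a lower (better) value, as does a subset with the same size but a higher TPR.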

Binary mapping strategy
The continuous pigeon-inspired optimizer (CPIO) algorithm involves a process of continual spatial repositioning for the pigeon, enabling it to traverse any point within space. However, in certain discrete scenarios such as feature selection, the pigeon's position, representing a solution vector, consists of binary values of 0 and 1. Therefore, updating continuous values requires the application of appropriate position adjustment techniques in addition to discretization operations.
In the context of feature selection, the pigeon's position within each dimension of the search space is constrained to 0 or 1. However, the velocity associated with each dimension is not subject to such limitations. Therefore, a conversion function is essential to map the continuous velocities onto binary position values. After conducting experimental analysis, we selected the Tanh function to map pigeon velocities into the binary space. The Tanh function formula (Sood et al., 2023) is shown in Eq. (9), and the positions of the individual pigeons are updated using a uniform random number $r \in (0, 1)$ together with the Tanh value through Eq. (10) in this article.
The individual pigeon's position is updated according to Eq. (10). For each dimension $j$ of the position, the mapped velocity $Tanh(V_i^t[j])$ is compared against a randomly generated number $r$. If $Tanh(V_i^t[j]) > r$, there is a strong positive correlation between velocity and position, and the position from the previous iteration in the current dimension is preserved. If $Tanh(V_i^t[j]) < -r$, there is a strong negative correlation between velocity and position, and the position from the previous iteration in the current dimension is inverted. In all other cases, where the correlation between velocity and position is weak, the optimal position value from the previous iteration is used directly.
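The three-way rule above can be sketched in Python as follows. The function name `binary_position_update` and the choice to draw a fresh random number for every dimension are assumptions of this sketch; the comparison logic follows the description in the text.

```python
import math
import random


def binary_position_update(prev_position, velocity, global_best, rng=random):
    """Map continuous velocities to a new binary position, dimension by dimension."""
    new_position = []
    for p_prev, v, g in zip(prev_position, velocity, global_best):
        r = rng.random()          # assumption: a fresh r for every dimension
        mapped = math.tanh(v)     # squashes the velocity into (-1, 1)
        if mapped > r:
            new_position.append(p_prev)        # strong positive: keep the old bit
        elif mapped < -r:
            new_position.append(1 - p_prev)    # strong negative: flip the old bit
        else:
            new_position.append(g)             # weak: copy the global-best bit
    return new_position
```

With a very large positive velocity the old bit is always kept, with a very large negative velocity it is always flipped, and with zero velocity the bit of the global best is copied.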

Improved map and compass operator phase
(1) Simulated annealing To tackle the issue of rapid convergence observed in conventional PIO algorithms, the proposed approach introduces a simulated annealing mechanism during this phase to prevent premature trapping in local optimal solutions. In the map and compass operator phase, each pigeon adjusts both its velocity and position in every iteration. The influence of the map and compass operator $R$ on the population decreases as the algorithm approaches the later stages of iteration. At this point, the algorithm relies mainly on the current globally optimal position $P_{global}$.
This approach adds a simulated annealing inner loop to each iteration. During the loop, a random pigeon undergoes perturbation, resulting in the modification of one value in the vector, such as changing a 1-value to a 0-value. Then, the fitness is recalculated, and a new feature subset is accepted based on the probability determined by the Metropolis criterion (Hao et al., 2023). The purpose of this criterion is to determine whether to accept a new state based on the change in energy value before and after the state modification. The study employs the Metropolis criterion outlined in Eq. (11), where $p(P_{global} \Rightarrow P_i')$ represents the probability of accepting the new solution $P_i'$, while $\Delta E$ represents the energy difference, defined as $Fitness(P_i') - Fitness(P_{global})$ in this context. In this article, the fitness is formulated as a minimization problem. If the fitness of the new solution $P_i'$ is lower than the fitness of the globally optimal solution $P_{global}$, the feature subset represented by $P_i'$ is superior to that of $P_{global}$, and $P_i'$ is accepted as the current globally optimal solution with a probability of 1. Otherwise, the probability $p(P_{global} \Rightarrow P_i')$ is used to determine whether the new solution should be accepted.
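A minimal Python sketch of the perturbation and acceptance steps is given below. It assumes the standard Metropolis form, in which a worse solution is accepted with probability exp(-ΔE / T); the temperature parameter `temperature`, its cooling schedule (not shown), and the function names are assumptions, since the text does not specify them.

```python
import math
import random


def perturb(position, rng=random):
    """Return a copy of a binary position with one randomly chosen bit flipped."""
    j = rng.randrange(len(position))
    flipped = list(position)
    flipped[j] = 1 - flipped[j]
    return flipped


def metropolis_accept(fitness_new, fitness_best, temperature, rng=random):
    """Accept a better solution always, a worse one with probability exp(-dE/T)."""
    delta_e = fitness_new - fitness_best  # fitness is minimized, so dE < 0 is better
    if delta_e < 0:
        return True
    # Assumed standard Metropolis rule; `temperature` must be positive.
    return rng.random() < math.exp(-delta_e / temperature)
```

At high temperature almost any perturbed subset is accepted, which widens the search, while at low temperature the rule approaches a purely greedy comparison.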
(2) Multi-dimensional similarity strategy During the map and compass operator phase, the white pigeon adjusts its flight position by tracking the position of the best pigeon (blue pigeon), as shown in Fig. 3.
In the continuous problem, the pigeon computes its velocity by subtracting its own position vector from the global optimal vector. In a discrete problem, this direct subtraction is not feasible because of the nature of discrete variables. This article therefore introduces a multi-dimensional similarity strategy for computing pigeon velocities. The strategy includes metrics such as the Pearson correlation coefficient (Saviour & Samiappan, 2023), cosine similarity (Alazzam, Sharieh & Sabri, 2020), and the Jaccard similarity coefficient (Yin et al., 2023), as shown in Eqs. (12)-(14). All three similarity indicators have limitations. To balance these limitations, this article employs weighted calculations to avoid relying on a single indicator. In this phase, each pigeon updates its velocity and position for this iteration based on Eqs. (15) and (10).
Equation (15) requires normalizing the Jaccard similarity coefficient to the range of [−1, 1], given that the Pearson correlation coefficient and cosine similarity have a value range of [−1, 1], whereas the Jaccard similarity coefficient ranges over [0, 1]. Here, $\omega_1$, $\omega_2$ and $\omega_3$ represent the weighting coefficients of the Pearson correlation coefficient, cosine similarity, and Jaccard similarity coefficient, respectively.
(3) Mechanism of mutation When the initial number of pigeons is high, there is a greater chance that two pigeons will represent the same solution, which reduces the search ability of the algorithm. Therefore, this approach includes a mutation mechanism in the flock's position updates. It checks for the existence of a solution with a similar position before adding the updated pigeon to the flock. If such a solution is found, all dimensions of the current pigeon undergo mutation based on the probability derived from a uniformly distributed random number $r \in [0, 1]$, which expands the search space.
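The weighted similarity between a pigeon and the global best, as described for the multi-dimensional similarity strategy, can be sketched in Python as below. The three measures follow their standard definitions. The equal default weights, the linear rescaling 2J − 1 used to bring the Jaccard coefficient into [−1, 1], and the fallback values for degenerate vectors are assumptions of this sketch, and the way the result enters Eq. (15) is not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard coefficient of two binary vectors; 1.0 if both are empty (assumption)."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return inter / union if union else 1.0


def weighted_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three measures, with Jaccard rescaled to [-1, 1]."""
    w1, w2, w3 = weights
    return w1 * pearson(a, b) + w2 * cosine(a, b) + w3 * (2.0 * jaccard(a, b) - 1.0)
```

Two identical, non-constant binary vectors score 1 under all three measures, so their weighted similarity is 1 whenever the weights sum to one.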

Improved landmark operator phase
During each iteration of the landmark operator phase, the pigeons are sorted based on their fitness values. Then, half of the pigeons with lower fitness values are eliminated. The current center position of the remaining dominant flock is considered the desired destination of the flock. The remaining flock adjusts its flight position towards the position of the desired destination, also known as the blue pigeon, as shown in Fig. 4. The population decay factor $\alpha$ is proposed to regulate the decay rate of the population, as the traditional PIO algorithm tends to converge too quickly and fall into local optimal solutions during the landmark operator phase.
Equation (16) defines $\beta$ as a constant in (0, 1), and $t$ as the number of iterations of the landmark operator. Whereas the traditional PIO algorithm halves the population in every iteration, the SABPIO algorithm updates the population size according to Eq. (17), thus preventing rapid population decay in the early stage. During the map and compass operator phase, all pigeons calculate their speed and position using Eqs. (15) and (10).
Algorithm 1 outlines the procedure for the feature selection algorithm SABPIO, which is based on an improved binary PIO framework. The upgraded algorithm introduces a simulated annealing loop within the initial phase of the main iteration loop to extend the exploration of the global search space. Additionally, the secondary while loop uses a population decay factor to regulate the pace of population reduction and mitigate premature convergence of the algorithm.

Experimental dataset
(1) UNSW-NB15 dataset The UNSW-NB15 dataset represents network traffic data collected by a cybersecurity research laboratory in Australia utilizing the IXIA PerfectStorm tool. It comprises four CSV files encompassing 254,047 data entries, featuring nine attack classifications, 43 descriptive attributes, and two classification labels for each entry. The detailed feature attributes of this dataset are outlined in Table 2.
(2) NSL-KDD dataset The NSL-KDD dataset serves as an updated iteration of the renowned KDD99 dataset, comprising 148,517 entries. Each entry is composed of 41 descriptive attributes and one class label, encompassing a total of 39 attack classifications. Within the training set, there [...]. The detailed feature attributes of this dataset are outlined in Table 3.
(3) CIC-IDS-2017 dataset The CIC-IDS-2017 dataset encompasses network traffic data gathered by the Canadian Institute for Cybersecurity (CIC) from authentic network scenarios. This dataset is constructed using actual network traffic captures, encompassing a broad spectrum of network intrusions and regular activities. It comprises real network traffic observed across various laboratory network environments designed to replicate the network traits found in commercial and industrial entities. It aligns with the traits of contemporary networks and stands as one of the presently recommended datasets. Featuring 2,830,743 records, each entry comprises 78 defining attributes and one class label, covering a total of eight attack classifications. The detailed characteristics of the dataset are presented in Table 4.
(4) Data preprocessing
1) Data conversion During the algorithm's execution, solely numerical data is utilized for training and testing purposes. Hence, the initial step involves transforming non-numeric data within the dataset into numerical format. Taking the UNSW-NB15 dataset as a case in point, out of the 45 descriptive attributes, three are non-numeric and necessitate conversion via one-hot encoding. For instance, consider the "proto" attribute containing 133 distinct values such as "tcp," "udp," and "sctp." These values are encoded into numerical representations ranging from 0 to 132. Subsequently, the "service" and "state" attributes undergo a similar transformation into numerical format utilizing the aforementioned method.
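The data conversion step can be illustrated with a short Python sketch. The text names the step one-hot encoding but describes mapping the 133 "proto" values to the integers 0 to 132; the sketch follows that integer mapping. The helper name `fit_codes`, the tiny example column, and the rule of assigning codes in order of first appearance are assumptions made for illustration.

```python
def fit_codes(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return codes


proto_column = ["tcp", "udp", "tcp", "sctp", "udp"]   # hypothetical sample of "proto"
proto_codes = fit_codes(proto_column)
encoded = [proto_codes[v] for v in proto_column]
```

With k distinct categories the codes run from 0 to k − 1, which for the 133 "proto" values gives the range 0 to 132 stated in the text. The same helper could be applied to the "service" and "state" columns.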
2) Normalization
Normalization is a data preprocessing technique that maps features with different scales and distributions onto a uniform scale so that they can be compared and analyzed together (Devendiran & Turukmane, 2024). This improves the accuracy and efficiency of data analysis and machine learning algorithms while mitigating biases caused by differences in scale between variables. The normalization formula is given in Eq. (18).

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{18}$$

where $x$ is the original feature value, $x_{max}$ is the maximum of the feature values, $x_{min}$ is the minimum of the feature values, and $x_{norm}$ is the output value, which lies in [0, 1].
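Min-max scaling of this kind, which maps each feature into [0, 1] using its minimum and maximum, can be sketched as follows. The function name `min_max_normalize`, the sample values, and the handling of a constant feature are assumptions for illustration; the source does not discuss the constant-feature case.

```python
def min_max_normalize(column):
    """Scale a list of numeric feature values into [0, 1]."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant feature: the formula would divide by zero. Returning
        # zeros is an assumption, not something the source specifies.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


# Hypothetical feature values
durations = [2.0, 4.0, 10.0]
scaled = min_max_normalize(durations)
```

In practice the minimum and maximum would typically be computed on the training set and reused for the test set, so that both are scaled consistently.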

Evaluation indicators
Multiple metrics are available for evaluating feature selection algorithms. In this study, we use evaluation metrics derived from the confusion matrix, including detection rate (DR), false alarm rate (FAR), accuracy (Acc), precision (Pre), and F1-score (Thakkar & Lohiya, 2023), as shown in Eqs. (19)-(23). Table 5 shows the confusion matrix. In the formulas, TP, FN, FP, and TN denote the numbers of true positives, false negatives, false positives, and true negatives, respectively.

(1) Detection rate (DR)
The detection rate, also known as the true positive rate (TPR) or recall, is the proportion of positive samples that the model correctly identifies. In network intrusion detection, it is the percentage of intrusion events that the model detects. A higher detection rate means that the model misses fewer intrusions.

$$DR = \frac{TP}{TP + FN} \tag{19}$$

(2) False alarm rate (FAR)
The false alarm rate, also known as the false positive rate (FPR), is the proportion of negative samples that the model incorrectly identifies as positive. In network intrusion detection, it is the percentage of normal behaviors erroneously flagged as intrusions. A lower FAR indicates that the model is better at avoiding false alarms.

$$FAR = \frac{FP}{FP + TN} \tag{20}$$

(3) Accuracy
Accuracy is the proportion of all samples that the model predicts correctly, and it measures the model's overall correctness.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

(4) Precision
Precision is the proportion of samples predicted as positive that are truly positive. In network intrusion detection, it is the share of samples flagged as intrusions that are actual intrusions. Higher precision means that reported alarms are more reliable and fewer of them are false.

$$Pre = \frac{TP}{TP + FP} \tag{22}$$

(5) F1-score
The F1-score balances precision and recall (DR), reflecting both how reliable and how complete the model's detections are. It is calculated as the harmonic mean of precision and recall.

$$F1 = \frac{2 \times Pre \times DR}{Pre + DR} \tag{23}$$
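The five indicators can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions of these metrics; the function name `detection_metrics` and the example counts are hypothetical, and the code assumes that every denominator is non-zero.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute DR, FAR, accuracy, precision, and F1 from confusion counts.

    Assumes each denominator is non-zero (for example, at least one
    positive sample and at least one positive prediction).
    """
    dr = tp / (tp + fn)                      # detection rate (recall, TPR)
    far = fp / (fp + tn)                     # false alarm rate (FPR)
    acc = (tp + tn) / (tp + fn + fp + tn)    # overall accuracy
    pre = tp / (tp + fp)                     # precision
    f1 = 2 * pre * dr / (pre + dr)           # harmonic mean of Pre and DR
    return {"DR": dr, "FAR": far, "Acc": acc, "Pre": pre, "F1": f1}


# Hypothetical counts: 100 attack samples and 100 normal samples
m = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

For these counts, DR is 0.90 and FAR is 0.05, which shows how a model can detect most attacks while still raising some false alarms on normal traffic.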

Experimental results
In this study, we conducted experiments to evaluate the proposed approach using Python 3.11.2 on a 64-bit Windows 11 operating system. The experiments were carried out on an Intel(R) Core(TM) i5-11400H processor with 16.00 GB of RAM. All feature selection algorithms were assessed using the decision tree (DT) and random forest (RF) classifiers from the scikit-learn library. Compared to alternative base classifiers, DT and RF are less sensitive to missing values and more robust against outliers and noise, which makes them well suited to assessing feature selection.

Table 6 lists the parameter configuration of the SABPIO algorithm. Experimentally, we found that the algorithm performs best when the number of individuals in the pigeon swarm is within the range [80, 150]; consequently, we set the number of pigeons to 128. While a larger number of pigeons allows a broader exploration of the search space, it also increases computational cost. In the inner loop of the annealing iterations, only the new solution and the current local optimal solution are compared, so the number of inner iterations has a minimal impact on time complexity; we therefore set it to 100. The fitness function takes the number of selected features into account. If the weight factor for this term is too large, the pigeon swarm may pursue feature subsets with fewer elements rather than optimal feature subsets. To prevent the fitness calculation from ignoring the impact of TPR and FPR, we set the weight factor of the number of selected features to 0.0075.
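To illustrate why the weight on the number of selected features matters, the sketch below uses a hypothetical fitness of the form (1 - TPR) + FPR + w * n_selected, to be minimized. This form, the function name `fitness`, and the two candidate subsets are assumptions made for illustration only; the actual fitness function is the one defined in "Proposed Improvement of PIO". Only the default weight of 0.0075 is taken from the text.

```python
def fitness(tpr, fpr, n_selected, feature_weight=0.0075):
    """Hypothetical fitness to minimize (assumed form, not the paper's)."""
    return (1.0 - tpr) + fpr + feature_weight * n_selected


# Two hypothetical candidate feature subsets
larger = dict(tpr=0.95, fpr=0.03, n_selected=12)   # better detection, more features
smaller = dict(tpr=0.90, fpr=0.05, n_selected=8)   # worse detection, fewer features

# With the small weight, the subset with better TPR/FPR has lower fitness.
small_weight_prefers_larger = fitness(**larger) < fitness(**smaller)

# With a much larger (hypothetical) weight, the smaller subset wins instead.
large_weight_prefers_smaller = (
    fitness(**smaller, feature_weight=0.05) < fitness(**larger, feature_weight=0.05)
)
```

Under these assumed numbers, raising the weight from 0.0075 to 0.05 flips the preference toward the subset with fewer features even though its detection performance is worse, which is the behavior the small weight is meant to avoid.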
(1) Results of UNSW-NB15
The performance, convergence, and efficiency of SABPIO were compared with those of the CPIO, SPIO, XGBoost, PSO, and ARM algorithms on the UNSW-NB15 dataset. Figure 5 shows the convergence curves of SABPIO, CPIO, SPIO, and PSO during feature selection under the random forest classifier. In the section "Proposed Improvement of PIO", fitness is defined as a minimization problem, so lower values are better. The curves suggest that SABPIO converges faster than the SPIO and PSO algorithms and achieves better fitness values over the iterations than the CPIO, SPIO, and PSO algorithms.
In Fig. 5, the SABPIO algorithm shows the fastest decrease in fitness within the first 30 iterations. The SPIO algorithm also decreases rapidly within the first 10 iterations; however, it then becomes trapped in a local optimum, which prevents it from finding a better solution. Around the 50th iteration, the SABPIO algorithm accepts a worse solution with the probability given by the Metropolis criterion, resulting in a slight fitness increase. By the time all algorithms have converged at 100 iterations, the solution found by SABPIO has a lower fitness than those of the other algorithms. The experimental findings indicate that SABPIO converges more efficiently than SPIO and PSO and selects a more effective feature subset than CPIO, SPIO, and PSO.

Figure 6 presents the detection rate (DR) and false alarm rate (FAR) obtained with the DT and RF classifiers on the feature subsets selected by SABPIO and by the other algorithms. Each bar represents the mean and standard deviation over 100 repeated runs of the DT and RF classifiers on the feature subset selected by the corresponding algorithm. In Fig. 6A, SPIO achieves the highest DR with the DT classifier, slightly surpassing SABPIO. However, DR is not the only metric used in this study to evaluate feature subsets for network intrusion detection. In Fig. 6B, SABPIO has a 2% lower FAR than SPIO at the cost of only a 0.2% reduction in DR. Compared with the CPIO, XGBoost, PSO, and ARM algorithms, the proposed SABPIO algorithm has advantages in both DR and FAR. Notably, the ARM algorithm takes a high detection rate as its optimization objective and neglects the impact of the false alarm rate on NIDS, which results in an elevated false alarm rate on this dataset. With the RF classifier, SABPIO shows more significant improvements over the other algorithms. The mean performance of the proposed algorithm across 100 repeated experiments surpasses that of the other algorithms, and its standard deviation remains consistently low.
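The slight fitness increase observed for SABPIO in Fig. 5 comes from the Metropolis criterion, which occasionally accepts a worse solution so that the search can leave a local optimum. A minimal sketch of the standard criterion for a minimization problem is shown below. The function name `metropolis_accept`, the temperature values, and the injected random generator are assumptions; the cooling schedule and the way SABPIO generates new solutions are not reproduced here.

```python
import math
import random


def metropolis_accept(current_fitness, new_fitness, temperature, rng=random):
    """Decide whether to move to the new solution (fitness is minimized)."""
    delta = new_fitness - current_fitness
    if delta <= 0:
        return True  # an equal or better solution is always accepted
    # Worse solution: accept with probability exp(-delta / T), which
    # shrinks as the temperature falls or as the solution gets worse.
    return rng.random() < math.exp(-delta / temperature)


rng = random.Random(0)  # seeded only to make the example repeatable
accept_better = metropolis_accept(0.30, 0.25, temperature=1.0, rng=rng)
accept_worse_cold = metropolis_accept(0.25, 0.30, temperature=1e-9, rng=rng)
```

At a high temperature, worse solutions are accepted often, which favors exploration; as the temperature approaches zero, the rule reduces to accepting only improvements.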
Figure 7 displays the accuracy and precision test results for the six algorithms on UNSW-NB15. Each bar represents the mean and standard deviation over 100 repeated runs of the DT and RF classifiers on the feature subset selected by the corresponding algorithm. The experimental results show that, for an equal number of iterations, the SABPIO algorithm improves accuracy by 0.12% to 4.89% and precision by 0.19% to 5.98% compared to the other algorithms. Accuracy and precision should be considered together, along with the other indicators. The PSO and ARM algorithms achieve high accuracy but low precision, which means that a larger share of the samples they identify as cyber-attacks are false predictions. In practice, this results in more instances of normal traffic being misclassified as attacks.
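A small worked example may help show how high accuracy and low precision can coexist when normal traffic dominates the test set. The confusion-matrix counts below are hypothetical and are not taken from the experiments; the helper names are likewise illustrative.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all samples that are classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fp):
    """Share of samples flagged as attacks that really are attacks."""
    return tp / (tp + fp)


# Hypothetical test set: 100 attacks and 900 normal flows
tp, fn = 90, 10    # attacks detected / missed
fp, tn = 60, 840   # normal flows flagged as attacks / correctly passed

acc = accuracy(tp, fn, fp, tn)
pre = precision(tp, fp)
```

Here 93% of all samples are classified correctly, yet only 60% of the raised alarms correspond to real attacks, because the 60 false alarms are few relative to the 900 normal flows but many relative to the 90 true detections.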
Figure 8 shows the mean and standard deviation of the F1-score from 100 repeated experimental tests for the feature subsets selected by the SABPIO algorithm and the other algorithms using the DT and RF classifiers. The F1-score achieved by SABPIO is 0.920 with the DT classifier and 0.927 with the RF classifier, which is higher than that of the other five algorithms. Furthermore, its lower standard deviation indicates better consistency and stability. Together, these results suggest that the selected feature subset represents the data well and yields stable performance.
Figure 9 compares the training and testing times before and after feature selection for the feature subsets selected by the various feature selection algorithms on UNSW-NB15. The results demonstrate that the number and quality of features have a significant impact on the model's training and testing time. The SABPIO feature selection algorithm can significantly reduce model training time and improve efficiency without compromising detection results. The timings were measured using the RF classifier. The training time of the RF classifier before feature selection was 1.21 s. After SABPIO feature selection, it fell to 0.29 s, a saving of 0.92 s, which is about 3.2 times the remaining training time (equivalently, training on all features takes about 4.2 times as long). Additionally, the testing time decreased from 0.096 to 0.068 s, a reduction of about 29%.
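Several different ratios can be read from the reported timings, and it may help to compute them explicitly. The sketch below derives three common ones from the four values given in the text. The function name `time_ratios` and the dictionary keys are illustrative, and reading the "about 3.2 times" figure as the time saved relative to the reduced training time is an inference from the numbers, not something the source states.

```python
def time_ratios(before, after):
    """Return three ways of expressing a reduction from `before` to `after`."""
    return {
        "times_as_long": before / after,              # old time over new time
        "saving_over_new": (before - after) / after,  # saving relative to new time
        "fraction_saved": (before - after) / before,  # saving relative to old time
    }


train = time_ratios(1.21, 0.29)    # RF training time in seconds
test = time_ratios(0.096, 0.068)   # RF testing time in seconds
```

For training, `saving_over_new` is about 3.17 and `times_as_long` is about 4.17; for testing, `fraction_saved` is about 0.29.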
(2) Results of NSL-KDD
The NSL-KDD dataset was used to evaluate the detection performance of the SABPIO, CPIO, SPIO, IG, PSO, and ARM algorithms. Figure 10 illustrates the DR and FAR from 100 repeated tests on the DT and RF classifiers for the feature subsets selected by SABPIO and by the other recent feature selection algorithms. As shown in Fig. 10A, SABPIO outperforms the other algorithms under the DT classifier with a DR of 90.2% (±1.3%), an improvement of approximately 3.6% over the next-best algorithm, CPIO. As Fig. 10B shows, the SABPIO algorithm prioritizes the balance between DR and FAR. Although its FAR is slightly higher than that of other algorithms such as SPIO, IG, PSO, and ARM, those algorithms fall significantly short of the proposed algorithm in terms of DR. As in the DT classifier experiments, the RF classifier using the SABPIO algorithm showed a significantly better mean DR than the other algorithms over 100 repetitions, with a slightly higher FAR. The proposed algorithm also maintained a low standard deviation, indicating the robustness and interpretability of the selected feature subset.
Figure 11 displays the results of 100 repeated experiments on NSL-KDD for the feature subsets selected by the six feature selection algorithms. The experimental results indicate that, under the same number of iterations, the SABPIO algorithm outperforms the other five algorithms in terms of accuracy and precision, achieving 90.6% (±0.7%) and 91.5% (±0.6%), respectively, whether using a DT classifier or an RF classifier. The accuracy of the other algorithms was 87.6% (±1.4%) and 83.2% (±1.2%).
Figure 12 displays the mean and standard deviation of the F1-score from 100 repeated experimental tests on the DT and RF classifiers for the feature subsets selected by the […] 99.897% in DT and RF classifiers, respectively. It is important to note that all evaluations are objective and based on empirical evidence.

Figure 13B indicates that SABPIO achieves the best FAR with both classifiers, except for a slightly higher FAR with the DT classifier compared to the IG algorithm.

Figure 14 displays the accuracy and precision test results of the feature subsets selected by the six feature selection algorithms on 20% of CIC-IDS-2017, repeated 100 times. The experimental results indicate that, under the same number of iterations, the SABPIO algorithm outperforms the other five algorithms in terms of accuracy and precision for both DT and RF classifiers, achieving 99.72%, 99.80%, and 99.38%, respectively, with an overall accuracy of 99.44%.
Figure 15 displays the mean and standard deviation of the F1-score from 100 repeated experimental tests on DT and RF classifiers for the feature subsets selected by the SABPIO

CONCLUSION
Network intrusion detection detects attacks by monitoring traffic. However, the large volume and high dimensionality of network data pose challenges to intrusion detection, and redundant and irrelevant features seriously affect detection performance. To address these issues, this article proposes the SABPIO feature selection algorithm, which incorporates mutation and simulated annealing into the map and compass operator and introduces a population decay factor in the landmark operator phase.
Experimental results indicate that the SABPIO algorithm effectively improves the detection rate and reduces false alarms, as well as training time.
However, SABPIO is subject to limitations that depend on the quality and completeness of the data. If the dataset contains a significant number of missing or outlier values, SABPIO may not achieve optimal performance. In future work, we will investigate how to improve the SABPIO algorithm to handle incomplete data. Moreover, attack samples and normal traffic samples are unbalanced in real network environments, so in further research we will consider the impact of sample distribution imbalance on the feature selection algorithm.
Embedded feature selection algorithms are integrated with the machine learning model training process in a seamless manner. This approach offers the advantage of performing feature selection and model training simultaneously, resulting in optimized performance in both aspects. Embedded feature selection algorithms view the feature selection process as an integral part of model training. Feature weights are assigned concurrently with model training, all within a unified framework. Yulianto, Sukarno & Suwastika (2019) sought to enhance machine learning-based NIDS by incorporating principal component analysis and ensemble feature selection techniques for feature selection. Due to AdaBoost's high

Table 1
Summary of related works.
(WOA) to address their respective limitations through hybridization. The experimental findings indicate that when combined with the Artificial Neural Network Weighted Random Forest (AWRF), the OWSA achieved an accuracy of 99.92% on the NSL-KDD dataset and 98% on the CICIDS2017 dataset. Zorarpaci (2024) presented a rapid wrapper feature selection technique, termed DBDE-QDA, which integrates two-class binary

Algorithm 1 Simulated Annealing Binary Pigeon Inspired Optimizer (SABPIO).
Input: Number of pigeons $Num_{pigeon}$, number of iterations $Num_t$, fitness function $Fitness$, number of annealing iterations $Num_{at}$
Check for duplicate items at each pigeon's position $[P_1, P_2, \ldots, P_{Num_{pigeon}}]$
05: Calculate fitness $Fitness(P_i)$ of each pigeon's position $[P_1, P_2, \ldots, P_{Num_{pigeon}}]$
06: Find global optimal solution $P_{global} = \min\{Fitness(P_i) \mid i \in [0, Num_{pigeon}]\}$
pigeon ! 1) // Landmark operator phase
14: Update center position of all pigeons $P_{center}$ by Eq. (3)
15: Update number of pigeons $Num_{pigeon}$ by Eq. (17)
global

Huang et al. (2024), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.2176

are 125,973 data points encompassing 22 distinct attack types, while the test set consists of 22,544 entries featuring a further 17 attack categories. The defining attributes within this dataset are detailed in Table

Table 2
UNSW-NB15 dataset features and types.

Table 3
NSL-KDD dataset features and types.

Table 6
Detailed parameters of SABPIO.